Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites

Authors

  • Min Sook Park
  • Zhe He
  • Zhiwei Chen
  • Sanghee Oh
  • Jiang Bian
Abstract

BACKGROUND The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers.

OBJECTIVE The aim of this study was to better understand consumers' usage of medical concepts by evaluating the coverage of concepts and semantic types of the Unified Medical Language System (UMLS) on diabetes-related postings in 2 types of social media: blogs and social question and answer (Q&A).

METHODS We collected 2 types of social media data: (1) a total of 3711 blogs tagged with "diabetes" on Tumblr posted between February and October 2015; and (2) a total of 58,422 questions and associated answers posted between 2009 and 2014 in the diabetes category of Yahoo! Answers. We analyzed the datasets using a widely adopted biomedical text processing framework, Apache cTAKES, and its extension, YTEX. First, we applied the named entity recognition (NER) method implemented in YTEX to identify UMLS concepts in the datasets. We then analyzed the coverage and the popularity of concepts in the UMLS source vocabularies across the 2 datasets (ie, blogs and social Q&A). Further, we conducted a concept-level comparative coverage analysis between SNOMED Clinical Terms (SNOMED CT) and Open-Access Collaborative Consumer Health Vocabulary (OAC CHV), the 2 UMLS source vocabularies with the most coverage on our datasets. We also analyzed the UMLS semantic types that were frequently observed in our datasets.

RESULTS We identified 2415 UMLS concepts from blog postings, 6452 UMLS concepts from social Q&A questions, and 10,378 UMLS concepts from the answers. The medical concepts identified in the blogs can be covered by 56 source vocabularies in the UMLS, while those in questions and answers can be covered by 58 source vocabularies. SNOMED CT was the dominant vocabulary in terms of coverage across all the datasets, ranging from 84.9% to 95.9%. It was followed by OAC CHV (between 73.5% and 80.0%) and Metathesaurus Names (MTH) (between 55.7% and 73.5%). All of the social media datasets shared frequent semantic types such as "Amino Acid, Peptide, or Protein," "Body Part, Organ, or Organ Component," and "Disease or Syndrome."

CONCLUSIONS Although the 3 social media datasets vary greatly in size, they exhibited similar conceptual coverage among UMLS source vocabularies, and the identified concepts showed similar semantic type distributions. As such, concepts that are both frequently used by consumers and also found in professional vocabularies such as SNOMED CT can be suggested to OAC CHV to improve its coverage.
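The coverage analysis described in the abstract can be illustrated with a minimal Python sketch. It assumes that coverage means the percentage of identified UMLS concepts that also appear in a given source vocabulary; the abstract does not spell out the exact formula. The function names, the vocabulary labels used as dictionary keys, and the concept identifiers are hypothetical placeholders, not data or code from the study.

```python
from typing import Dict, Set


def vocabulary_coverage(
    concepts: Set[str], vocab_index: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Percentage of identified concepts found in each source vocabulary.

    Assumed definition: |concepts in vocabulary| / |concepts| * 100.
    """
    if not concepts:
        return {name: 0.0 for name in vocab_index}
    return {
        name: 100.0 * len(concepts & members) / len(concepts)
        for name, members in vocab_index.items()
    }


def suggest_concepts(
    concepts: Set[str],
    vocab_index: Dict[str, Set[str]],
    reference: str,
    target: str,
) -> Set[str]:
    """Concepts used in the data and present in `reference` but absent from `target`."""
    return (concepts & vocab_index[reference]) - vocab_index[target]


# Toy data: placeholder concept identifiers, not real study output.
identified = {"C0000001", "C0000002", "C0000003", "C0000004"}
vocab_index = {
    "SNOMEDCT_US": {"C0000001", "C0000002", "C0000003"},
    "CHV": {"C0000001", "C0000002"},
}

coverage = vocabulary_coverage(identified, vocab_index)
suggestions = suggest_concepts(identified, vocab_index, "SNOMEDCT_US", "CHV")
print(coverage)
print(suggestions)
```

In this toy example, three of the four identified concepts are in the first vocabulary and two are in the second, and the one concept found only in the first is the kind of candidate the conclusion proposes suggesting to OAC CHV. A real analysis would also weight concepts by how often consumers use them, which this sketch omits.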

Similar resources

Similarity measurement for describe user images in social media

Online social networks like Instagram are places for communication. Also, these media produce rich metadata which are useful for further analysis in many fields including health and cognitive science. Many researchers are using these metadata like hashtags, images, etc. to detect patterns of user activities. However, there are several serious ambiguities like how much reliable are these informa...

Social Media and Youth

When examining young people’s experience of social media, it is useful to extend the notion of social media to appreciate not only the antecedents of some current youth online practices, but also the development of research concepts and frameworks related to this topic. For many researchers and media commentators the term social media refers principally, and narrowly, to the more communication ...

An Exploratory Study Regarding the Brand-Consumer Relationship in Social Media

In the digital era concepts such as social networks, blogs and forums have become integrated keywords in the marketing communication strategies of brands worldwide. The old ways of building an online presence, such as simple websites, have diminished most of their relevance in front of social media tools. The ease of establishing contacts with consumers made out of these new applications propit...

Precursor to the Arab Spring, Evidence from the Social Media

Using computational analysis of large-scale textual data from Twitter and various blog sites over the period 2007 to 2012, we study the differences among the turmoils that occurred in Algeria, Bahrain, Egypt, Libya, Syria, Tunisia, and Yemen. We present visualizations that serve as a tool to compare the different countries of the Arab Spring, and to study them in chronological order to track the use of sentime...

Towards Discovery of Influence and Personality Traits through Social Link Prediction

Estimation of a person’s influence and personality traits from social media data has many applications. We use social linkage criteria, such as number of followers and friends, as proxies to form corpora, from popular blogging site Livejournal, for examining two two-class classification problems: influential vs. non-influential, and extraversion vs. introversion. Classification is performed usi...

Journal title:

Volume 4, Issue

Pages -

Publication date: 2016